(57) Abstract: The methods and systems of the invention involve the generation and use of a cross-linked keyphrase ontology database. The database is generated by defining at least one keyphrase, representing the keyphrase by a keyphrase node in an ontology, cross-linking the keyphrase node to a second keyphrase node, and then repeating the preceding steps for each keyphrase defined. A retrievable object can be indexed in a cross-linked keyphrase ontology database by representing the retrievable object by an object node in an ontology and then cross-linking the object node to a keyphrase node, where the keyphrase node represents a keyphrase in a second ontology and the keyphrase is related to the retrievable object. The cross-linked keyphrase ontology database can be searched by parsing a natural language statement into a structured representation and searching the cross-linked keyphrase ontology database. The cross-linked ontology database can be used for disambiguating syntactically ambiguous natural language statements.
TITLE OF THE INVENTION: METHODS AND SYSTEMS FOR GENERATING AND SEARCHING A CROSS-LINKED KEYPHRASE ONTOLOGY DATABASE

This application claims priority from U.S. Provisional Patent Application Serial No. 60/216^46 filed July 7, 2000.

Background of the Invention 
With the explosion of information over the last twenty years, it has become very difficult for people to find the information they are looking for. The World Wide Web contains well over one billion web pages, and even corporate databases like large product catalogs, or domain-specific databases like Medline, often have many millions of documents, making the search for a particular product or piece of information extremely difficult. If the searcher does not know the exact name, address, or identification number of the item he is trying to find, he must often dig through thousands of search results to find relevant information. What is needed is a method for finding retrievable objects, such as documents, that is easy to use and provides excellent recall and precision.

Keyword searches over document databases are the most common way searchers find documents. A keyword index gives the user the ability to enter words; if the words are present in an indexed document, then the document is returned in the search results. Keyword searches are prone to both precision and recall errors. Precision errors occur when a search returns objects not sought by the user. Recall errors occur when a search fails to return all the existing objects sought by the user. Precision errors result from polysemy and from lack of syntactic context. For example, if the keywords are "computer" and "chair," returned elements may well concern furniture, computers, and the Chair of the Computer department. Recall errors result from synonymy. "Chair," for instance, might be used to mean "head of the department," but a relevant document might be indexed under the keyword "chairperson," resulting in failure to match that document.
Some keyword search systems use a thesaurus to broaden search terms and thereby reduce recall errors. Since synonym sets in English and other languages overlap considerably, however, the use of a thesaurus leads to worse precision. "Blues," for instance, is a synonym for "depression" as well as a type of music. Thus a user searching for items related to music may also be returned items related to mood. Boolean syntax, such as "and" and "or" searches, may also be used with common keyword systems to improve precision and recall, but this is beyond the abilities of all but the most sophisticated users.

Keyword methods have been extended to keyphrase searching by allowing multiple words enclosed by quotation marks to be used as alphanumeric strings. This type of keyphrase search proceeds identically to a keyword search, except that spaces are included within the string being sought. This type of keyphrase search can improve precision, but it exacerbates recall errors, since an exact phrase match is required.

Keyword methods have also been extended to allow natural language input from users. Natural language is language as it is commonly written or spoken, e.g., "I want an Italian leather handbag with a matching wallet." Some natural language systems allow this type of input, but they generate a keyword search from the substantive words in the input, such as "Italian and leather and handbag and matching and wallet." While this makes the search input easy for the user, since natural language is the most natural way to state a request, transforming the search into a Boolean keyword search discards much of the syntactic information supplied by the natural language, thus reducing the relevance of the search results.

Fujisawa et al. discloses the use of a semantic network to index and retrieve documents. 
(Fujisawa, et al., in U.S. Patent No. 5,555,408). The methods disclosed by Fujisawa et al., 
however, require extensive knowledge engineering effort in deployment. 

Another known interface type allows natural language queries of items which are annotated to describe their content (Katz et al., U.S. Patent Nos. 5,309,359 and 5,404,295). A natural language understanding system is used to map natural language queries onto the annotations, and the documents that have matching annotations are returned to the user. The annotation process may be laborious, and the quality of results is highly dependent on the functioning of the natural language understanding system.

This invention addresses the problems of keyword searching, semantic networks, and annotation searches by allowing high-precision, high-recall natural language searching with minimal knowledge engineering. The objects are indexed in a database of cross-linked keyphrases, which also allows disambiguation of the natural language.
Summary of the Invention
The methods and systems of the invention involve the generation and use of a cross-linked keyphrase ontology database. A cross-linked keyphrase ontology database is created by: (a) defining at least one keyphrase; (b) representing the keyphrase by a keyphrase node in an ontology; (c) cross-linking the keyphrase node to at least one second keyphrase node, where the second keyphrase node represents a second keyphrase in a second ontology; and (d) repeating steps (b) and (c) for each keyphrase defined in step (a). The keyphrase in step (a) may be generated by parsing a text and can be selected from a group consisting of nouns, adjectives, verbs and adverbs. In one embodiment, the keyphrase in step (a) and the second keyphrase have at least one word in common. The text parsed may be in English or in any other written or spoken language.

The methods and systems of the invention also allow for indexing a retrievable object in a cross-linked keyphrase ontology database. Indexing comprises the steps of: (a) representing the retrievable object by an object node in an ontology; and (b) cross-linking the object node to a keyphrase node, where the keyphrase node represents a keyphrase in a second ontology and the keyphrase is related to the retrievable object. In one embodiment, the keyphrase is determined by parsing a text associated with the retrievable object. The retrievable object may be a document, a web page, a pointer or an executable computer program.
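As an illustration only, the two indexing steps can be sketched in Python. The `Node` class, the `index_object` function and the example data are hypothetical; in particular, a case-insensitive phrase match against the object's text stands in for the parser that would determine which keyphrases are related to the object.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A keyphrase node or object node (hypothetical; compared by identity)."""
    name: str
    ontology: str
    parents: list = field(default_factory=list)     # inheritance links to parent nodes
    cross_from: list = field(default_factory=list)  # origins of incoming cross-links


def index_object(name, ontology, parent, keyphrase_nodes, text):
    """Index a retrievable object.

    Step (a): represent the object by an object node in an ontology.
    Step (b): cross-link the object node to keyphrase nodes in other ontologies
    whose keyphrases are related to the object. Relatedness is approximated by
    a naive phrase match on the object's text (an assumption, not a parser).
    """
    obj = Node(name, ontology, parents=[parent] if parent is not None else [])
    lowered = text.lower()
    for kp in keyphrase_nodes:
        if kp.ontology != ontology and kp.name.lower() in lowered:
            obj.cross_from.append(kp)
    return obj


italian_restaurant = Node("Italian restaurant", "restaurant")
lamb_napoletana = Node("lamb Napoletana", "food")
sushi = Node("sushi", "food")
beppo = index_object("Beppo's Restaurant", "restaurant", italian_restaurant,
                     [lamb_napoletana, sushi], "Beppo's serves a fine lamb Napoletana.")
print([kp.name for kp in beppo.cross_from])  # ['lamb Napoletana']
```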

The methods and systems of the invention also permit searching of a cross-linked keyphrase ontology database. Searching comprises the steps of: (a) parsing a natural language statement into a structured representation, where the structured representation comprises at least one keyphrase; (b) searching the cross-linked keyphrase ontology database for at least one object node, where the object node is cross-linked to a keyphrase node representing a second keyphrase and where the second keyphrase matches the keyphrase parsed in step (a); and (c) defining a search result as a retrievable object, wherein the retrievable object is represented by the object node. The search result can be displayed to a user in a list. The retrievable object may be an executable computer program. The natural language statement may be a query.

WO 02/05137 PCT/US01/21459

In one embodiment, the keyphrase in step (a) and the second keyphrase are identical. In another embodiment, the keyphrase in step (a) and the second keyphrase are synonyms. In yet another embodiment, the keyphrase in step (a) and the second keyphrase are metonyms.

Searching may be done in a natural language such as English or in any other written or 
spoken language. 
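A minimal sketch of the three search steps follows. It assumes a toy `parse` function that simply looks for known keyphrases in the statement (a real system would build a structured representation) and a hypothetical `SYNONYMS` table standing in for the synonym embodiment. Cross-links received by an ancestor are treated as inherited, as described for Figure 1 below, and the data is a fragment of Figure 2; all names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    parents: list = field(default_factory=list)     # inheritance links to parent nodes
    cross_from: list = field(default_factory=list)  # origins of incoming cross-links


# Hypothetical synonym table for the "synonym" embodiment (lower-case keys and values).
SYNONYMS = {"italian cuisine": "italian food"}


def canonical(keyphrase):
    key = keyphrase.lower()
    return SYNONYMS.get(key, key)


def lineage(node):
    """The node plus all its ancestors; cross-links received by any of them are inherited."""
    seen, stack = [], [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.append(n)
            stack.extend(n.parents)
    return seen


def parse(statement, vocabulary):
    """Toy stand-in for step (a): keep every known keyphrase occurring in the statement."""
    text = statement.lower()
    return {canonical(kp) for kp in vocabulary if kp.lower() in text}


def search(statement, object_nodes, vocabulary):
    """Steps (b) and (c): return objects cross-linked to a keyphrase matching the query."""
    query = parse(statement, vocabulary)
    results = []
    for obj in object_nodes:
        received = {canonical(o.name) for n in lineage(obj) for o in n.cross_from}
        if query & received:
            results.append(obj.name)
    return results


# A fragment of Figure 2.
italian = Node("Italian")
food = Node("food")
italian_food = Node("Italian food", [food], [italian])
restaurant = Node("restaurant")
italian_restaurant = Node("Italian restaurant", [restaurant], [italian, italian_food])
beppo = Node("Beppo's Restaurant", [italian_restaurant])
VOCABULARY = ["Italian food", "Italian cuisine", "sushi"]

print(search("Where can I get Italian cuisine?", [beppo], VOCABULARY))  # ["Beppo's Restaurant"]
```

The query never mentions restaurants, yet Beppo's is found because the "Italian food" keyphrase reaches the object node through a cross-link inherited from "Italian restaurant".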

The methods and systems of the invention also permit disambiguating a syntactically ambiguous natural language statement. Disambiguation comprises the steps of: (a) parsing the syntactically ambiguous natural language statement into at least two structured representations, where the first structured representation comprises at least one first keyphrase and the second structured representation comprises at least one second keyphrase; (b) searching a cross-linked keyphrase ontology database for a keyphrase node representing a third keyphrase, where the third keyphrase matches the first keyphrase or the second keyphrase; (c) if the first keyphrase matches the third keyphrase and the second keyphrase does not match the third keyphrase, designating the first structured representation as a first disambiguated statement interpretation; (d) if the second keyphrase matches the third keyphrase and the first keyphrase does not match the third keyphrase, designating the second structured representation as a second disambiguated statement interpretation; and (e) if the first keyphrase matches the third keyphrase and the second keyphrase matches the third keyphrase, or the first keyphrase does not match the third keyphrase and the second keyphrase does not match the third keyphrase, determining that the syntactically ambiguous natural language statement cannot be disambiguated.
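The five disambiguation steps reduce to a small decision rule. The sketch below is hypothetical: it assumes each structured representation has already been reduced to a single keyphrase, and it models "matching" as case-insensitive identity or a lookup in an optional synonym table. The example uses the phrase "Italian salami sandwich" discussed with Figure 3.

```python
def disambiguate(first_keyphrase, second_keyphrase, database_keyphrases, synonyms=None):
    """Pick between two parses of a syntactically ambiguous statement.

    Returns "first" or "second" when exactly one parse's keyphrase matches a
    keyphrase in the database (steps (c) and (d)), and None when both or
    neither match (step (e)).
    """
    database = {kp.lower() for kp in database_keyphrases}
    synonyms = {k.lower(): v.lower() for k, v in (synonyms or {}).items()}

    def matches(keyphrase):
        # Step (b): look for a third keyphrase identical to, or a synonym of, this one.
        key = keyphrase.lower()
        return key in database or synonyms.get(key) in database

    first, second = matches(first_keyphrase), matches(second_keyphrase)
    if first and not second:
        return "first"
    if second and not first:
        return "second"
    return None


# "Italian salami sandwich": does "Italian" modify "salami" or "sandwich"?
DATABASE = {"Italian", "salami", "Italian salami", "sandwich"}
print(disambiguate("Italian salami", "Italian sandwich", DATABASE))  # first
```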

The syntactically ambiguous natural language statement may be a query. In one embodiment, the third keyphrase is identical to the first keyphrase or the second keyphrase. In another embodiment, the third keyphrase is a synonym of the first keyphrase or the second keyphrase, while in another embodiment the third keyphrase is a metonym of either the first keyphrase or the second keyphrase. Disambiguation may be done on a syntactically ambiguous natural language statement in the English language or in any other spoken or written language.
Brief Description of the Figures

Figure 1 is a diagram illustrating the notations used.

Figure 2 is a diagram illustrating a cross-linked keyphrase ontology database.

Figure 3 is a diagram showing a cross-linking scheme for a three-word keyphrase.

Figure 4 is a diagram showing an alternative cross-linking scheme for a three-word keyphrase.

Figure 5 is a diagram illustrating a cross-linked keyphrase ontology database having deeper ontologies than in Figure 2.

Figure 6 is a diagram showing a verb ontology with cross-linking of keyphrase nodes.

Figure 7 is a diagram showing an alternate verb keyphrase cross-linking scheme.

Figure 8 is a diagram showing a section of a cross-linked keyphrase ontology database for a shoe manufacturer.

Figure 9a is a diagram illustrating the indexing of retrievable objects from a table.

Figure 9b is a diagram illustrating the indexing of retrievable objects from a text.

Figure 10 is a structured representation of a sample query.

Figure 11 is a diagram showing the disambiguation process.

Figure 12 is a structured representation of a sample keyphrase.

Figure 13 is an alternate structured representation of the sample keyphrase in Figure 12.

Figure 14 is a structured representation of a sample keyphrase.

Figure 15 is an alternate structured representation of the keyphrase in Figure 14.

Figure 16 is a diagram showing the system of the invention.

Figure 17 is a structured representation of a sample query.

Figure 18 is a truncated structured representation of the sample query of Figure 17.

Figure 19 is a second truncated structured representation of the sample query of Figure 17.
Detailed Description of the Invention
Figure 1 illustrates the terms used in the figures. Two ontologies 1.01 and 1.02 are shown, where an ontology is a set of nodes linked by inheritance links 1.06, 1.07 and 1.13. Inheritance links 1.06, 1.07 and 1.13 are shown on this and subsequent figures as solid-line arrows, which originate at a parent node and terminate at a child node. The parent of a given node (e.g., node 1.08) is a node (e.g., node 1.03) from which an inheritance link (e.g., link 1.06) that terminates on the given node originates. The child of a given node (e.g., node 1.03) is a node (e.g., node 1.08) on which an inheritance link (e.g., link 1.06) that originates from the given node terminates. As in family trees, all of a node's parents, and its parents' parents, and so on, recursively, form the node's ancestors, and all of a node's children, and its children's children, and so on, recursively, form the node's descendants. Inheritance means that if a node is the recipient of a cross-link, then any descendant of that node is also a recipient of the cross-link. In Figure 1, for example, keyphrase node 1.08 inherits a cross-link to keyphrase node 1.05, and the object node 1.14 inherits cross-links to both keyphrase node 1.05 and keyphrase node 1.10.

A node is in the same ontology as a second node if either of the nodes is an ancestor of the other node, or if the nodes share a common ancestor node. For example, in Figure 1, node 1.03 and node 1.14 are in the same ontology 1.01 because node 1.03 is an ancestor of node 1.14 through inheritance links 1.13 and 1.06. Node 1.08 and node 1.14 are in the same ontology 1.01 because (i) they share the same ancestor node 1.03 and (ii) node 1.08 is an ancestor of node 1.14 through inheritance link 1.13. Node 1.05 is in a different ontology from node 1.14 since node 1.05 is not an ancestor of node 1.14, node 1.14 is not an ancestor of node 1.05, and there are no nodes which are ancestors of both node 1.14 and node 1.05.
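The family-tree relations, the inheritance of cross-links, and the same-ontology test for Figure 1 can be sketched as follows. This is an illustration with names of our own choosing; only the links the text states explicitly are encoded (inheritance links 1.06 and 1.13, and cross-link 1.04 from node 1.05 to node 1.03), and the figure's remaining links are omitted.

```python
from collections import defaultdict

# Only the links the text states for Figure 1; the figure's other links are omitted.
INHERITANCE = [("1.03", "1.08"), ("1.08", "1.14")]  # (parent, child): links 1.06 and 1.13
CROSS_LINKS = [("1.05", "1.03")]                    # (origin, recipient): cross-link 1.04

PARENTS = defaultdict(set)
CHILDREN = defaultdict(set)
for parent, child in INHERITANCE:
    PARENTS[child].add(parent)
    CHILDREN[parent].add(child)


def closure(node, links):
    """Every node reachable by repeatedly following `links` from `node`."""
    found, stack = set(), [node]
    while stack:
        for nxt in links.get(stack.pop(), ()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


def ancestors(node):
    return closure(node, PARENTS)


def descendants(node):
    return closure(node, CHILDREN)


def received_cross_links(node):
    """Origins of cross-links the node receives directly or inherits from an ancestor."""
    lineage = {node} | ancestors(node)
    return {origin for origin, recipient in CROSS_LINKS if recipient in lineage}


def same_ontology(a, b):
    """True if one node is an ancestor of the other or the two share an ancestor."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return a in anc_b or b in anc_a or bool(anc_a & anc_b)


print(sorted(ancestors("1.14")))      # ['1.03', '1.08']
print(received_cross_links("1.14"))   # {'1.05'}
print(same_ontology("1.05", "1.14"))  # False
```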

Cross-links 1.04 and 1.09 are shown in this and subsequent figures as broken-line arrows, which originate at the node that supplies the keyphrase (e.g., keyphrase node 1.05) and terminate at the node which receives the keyphrase (e.g., keyphrase node 1.03). Cross-link terminations (or cross-link recipient status) are inherited in each ontology. As used herein, the term node may refer to keyphrase nodes or object nodes.



Cross-linked Keyphrase Ontology Database 
The methods of the invention involve the generation and use of a cross-linked keyphrase ontology database. A cross-linked keyphrase ontology database is created by: (a) defining at least one keyphrase; (b) representing the keyphrase by a keyphrase node in an ontology; (c) cross-linking the keyphrase node to at least one second keyphrase node, wherein the second keyphrase node represents a second keyphrase in a second ontology; and (d) repeating steps (b) and (c) for each keyphrase defined in step (a). The keyphrase in step (a) may be generated by parsing a text and can be selected from a group consisting of nouns, adjectives, verbs and adverbs. In one embodiment, the keyphrase in step (a) and the second keyphrase have at least one word in common. The text parsed may be in English or in any other written or spoken language.
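A hypothetical sketch of steps (a) through (d) is shown below. The cross-linking criterion it uses, that a keyphrase node receives a cross-link from a node in a different ontology whose words form a proper subset of its own words (e.g., "Italian" to "Italian food"), is an assumption built on the embodiment in which the two keyphrases share a word; the text does not prescribe it. `build_database` and the data layout are illustrative only.

```python
def build_database(keyphrases):
    """Build a toy cross-linked keyphrase ontology database.

    keyphrases: iterable of (keyphrase, ontology) pairs, i.e. step (a).
    Returns {keyphrase: {"ontology": ..., "cross_from": set_of_origin_keyphrases}}.
    Assumes each keyphrase string belongs to only one ontology.
    """
    keyphrases = list(keyphrases)
    # Step (b): represent each keyphrase by a keyphrase node in its ontology.
    db = {kp: {"ontology": ont, "cross_from": set()} for kp, ont in keyphrases}
    # Step (d): repeat step (c) for each keyphrase defined in step (a).
    for kp, ont in keyphrases:
        words = set(kp.lower().split())
        # Step (c): cross-link to keyphrase nodes in a second ontology (assumed heuristic).
        for other, other_ont in keyphrases:
            if other_ont != ont and set(other.lower().split()) < words:
                db[kp]["cross_from"].add(other)
    return db


KEYPHRASES = [
    ("Italian", "nationality"), ("regional", "nationality"),
    ("food", "food"), ("Italian food", "food"),
    ("restaurant", "restaurant"), ("Italian restaurant", "restaurant"),
]
DB = build_database(KEYPHRASES)
print(DB["Italian food"]["cross_from"])  # {'Italian'}
```

This heuristic only captures modifier cross-links such as "Italian" to "Italian food"; association cross-links like the one from "Italian food" to "Italian restaurant" in Figure 2 would have to be added by other means.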

As shown in Figure 1, a cross-linked keyphrase ontology database is a database in which objects are represented as object nodes 1.14 attached to cross-linked ontologies 1.01 and 1.02. Ontologies of keyphrases 1.01 and 1.02 are stored in the keyphrase domain 1.11, which contains keyphrase nodes 1.03, 1.05, 1.08 and 1.10, while particular objects that might be retrieved are stored in the object domain 1.12, which contains object nodes 1.14. Keyphrase nodes 1.03, 1.05, 1.08 and 1.10 are nodes that, together with their inheritance links 1.06, 1.07 and 1.13 and cross-links 1.04 and 1.09, represent keyphrases. Object nodes 1.14 are nodes that represent at least one retrievable object, such as pages, web pages, files, documents, product or business names, descriptions, information, or commands. A command can be an executable computer program. For example, a command might be a script that launches a computer program. In many applications, the command is executed when the object node is returned in the result set of a query. For example, the query "What is my checking account balance?" from a user might result in an object node that executes a sequence of commands that first ascertains the user's checking account number, accesses a database to determine the account balance, and then displays the account balance to the user.

As seen in Figure 1, the object nodes 1.14 are part of at least one ontology (e.g., Ontology A 1.01 in Figure 1). Object nodes 1.14 may contain the retrievable object directly, or they may contain a pointer to the retrievable object, which allows the object to be recovered if it is returned as part of a search result. The pointer may be a file path or, if the retrievable object is a web page, a Uniform Resource Locator (URL).
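A small sketch of an object node that holds its retrievable object directly, a pointer to it, or a command executed when the node is returned in a result set. `ObjectNode`, `retrieve`, `show_balance`, the fixed balance and the example URL are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ObjectNode:
    """Object node holding a retrievable object, a pointer to it, or a command."""
    name: str
    content: Optional[str] = None                # the object itself, stored directly
    pointer: Optional[str] = None                # e.g. a file path or a URL
    command: Optional[Callable[[], str]] = None  # executed when the node is returned


def retrieve(node: ObjectNode) -> Optional[str]:
    """What a search result yields for this node."""
    if node.command is not None:
        return node.command()
    if node.content is not None:
        return node.content
    return node.pointer


def show_balance() -> str:
    # Hypothetical stand-in for looking up the user's account and its balance.
    balance = 1234.56
    return f"Your checking account balance is ${balance:,.2f}"


balance_node = ObjectNode("checking account balance", command=show_balance)
page_node = ObjectNode("Beppo's Restaurant", pointer="https://example.com/beppos")
print(retrieve(balance_node))  # Your checking account balance is $1,234.56
print(retrieve(page_node))     # https://example.com/beppos
```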

Keyphrases stored in the keyphrase domain 1.11 are arranged in ontologies 1.01 and 1.02. The ontologies 1.01 and 1.02 are used to define the inheritance of cross-links 1.04 and 1.09, and taken together, inheritance links 1.06, 1.07 and 1.13 and cross-links 1.04 and 1.09 form keyphrases. A keyphrase is an ordered series of one or more words, which may contain nouns, verbs, adjectives and adverbs. Two-word keyphrases are stored in the keyphrase domain as cross-linked keyphrase nodes (e.g., 1.03 and 1.05), or as ontology intersections. An ontology intersection is a node connected by inheritance links to more than one ontology. As shown in Figure 1, cross-links 1.04 and 1.09 are directional, with origins (keyphrase nodes) 1.05 and 1.10 (arrow tail) and recipients 1.03, 1.08, and 1.14 (arrow head). The origins 1.05 and 1.10 of cross-links 1.04 and 1.09 are keyphrase nodes that represent keyphrases. The recipients 1.03, 1.08 and 1.14 of cross-links 1.04 and 1.09 are nodes that represent a keyphrase and/or a retrievable object, or that may have descendants which are object nodes representing retrievable objects. If the recipient node represents a keyphrase and has no descendants that are object nodes, the keyphrase which the origin of the cross-link represents will be part of the keyphrase the recipient represents. If the node that receives a cross-link (1.03, 1.08 or 1.14) represents a retrievable object or has descendants which are object nodes, as in Ontology A 1.01, the keyphrase which the origin nodes 1.05 and 1.10 represent may be a keyphrase by which the retrievable object, or the set of object nodes descended from the recipient, is to be matched, rather than just a sub-phrase of the keyphrase represented by the recipient node 1.03, 1.08 or 1.14.

This invention is illustrated in the specific examples which follow. These examples are set forth below to aid in the understanding of the invention, but are not intended to, and should not be construed to, limit in any way the invention as set forth in the claims which follow thereafter.

These points are illustrated by Figure 2, which shows a keyphrase domain 2.24 and an object domain 2.26 for a database used to index restaurants. The keyphrase domain shown in Figure 2 has four ontologies: one for restaurants (which are retrievable objects) 2.01, one for food types 2.02, one for nationalities 2.03 and one for meat 2.04. As shown in Figure 2, the restaurant ontology 2.01 contains two keyphrase nodes 2.05 and 2.14, representing the keyphrases "restaurant" and "Italian restaurant," respectively, from which an object node representing a retrievable object descends. The food ontology 2.02 shown in Figure 2 has three keyphrase nodes 2.06, 2.15 and 2.23, representing the keyphrases "food," "Italian food," and "lamb Napoletana," respectively. The nationality ontology 2.03 shown in Figure 2 contains two keyphrase nodes 2.07 and 2.16, representing the keyphrases "regional" and "Italian," respectively. The meat ontology 2.04 contains three keyphrase nodes representing the keyphrases "meat," "lamb" and "lamb Napoletana," respectively. The object domain 2.26 as shown in Figure 2 includes just one object node 2.27, representing a retrievable object, "Beppo's Restaurant." The keyphrase node 2.14 representing the keyphrase "Italian restaurant" is the recipient of a cross-link 2.13 from a keyphrase node 2.16 representing the keyphrase "Italian," which is part of the keyphrase "Italian restaurant" (keyphrase node 2.14), and is also the recipient of a cross-link 2.18 from a keyphrase node 2.15 representing the keyphrase "Italian food," which is a keyphrase by which the object node 2.27 descended from the keyphrase "Italian restaurant" (keyphrase node 2.14) can be matched. The keyphrase node 2.15 representing the keyphrase "Italian food," by contrast, is only the recipient of a cross-link 2.19 from a keyphrase node 2.16 representing the keyphrase "Italian," which is part of the keyphrase it represents, "Italian food."
Cross-links between keyphrase nodes in a cross-linked keyphrase ontology database can be used to represent syntactic relations inherent in keyphrases. For example, the keyphrase "Italian food" (keyphrase node 2.15) is represented in the cross-linked keyphrase ontology database shown in Figure 2 as a keyphrase node 2.15 cross-linked 2.19 to another keyphrase node 2.16. It has the parent keyphrase node 2.06 representing "food" and is modified by the keyphrase "Italian" (keyphrase node 2.16), which exists in a different ontology, the nationality ontology 2.03. The cross-linked keyphrase node 2.15 representing the keyphrase "Italian food" corresponds to a type of the keyphrase "food" (keyphrase node 2.06) modified by the keyphrase "Italian" (keyphrase node 2.16). The keyphrase "lamb Napoletana" (keyphrase node 2.23) is stored in the database shown in Figure 2 as an ontology intersection. It has a parent keyphrase "Italian food" (keyphrase node 2.15) and a parent keyphrase "lamb" (keyphrase node 2.17), each from a different ontology, 2.02 and 2.04 respectively. Keyphrases of three or more words can be represented in the keyphrase domain 2.24 by cross-links or intersections with nodes representing keyphrases with fewer words.

Figure 3 shows a possible keyphrase domain of a cross-linked keyphrase ontology database, which contains three ontologies: for nationality, for meat, and for sandwiches. The nationality ontology contains just two keyphrase nodes 3.01 and 3.07, the meat ontology contains three keyphrase nodes 3.02, 3.08 and 3.13, and the sandwich ontology contains just two keyphrase nodes 3.03 and 3.12. Keyphrase nodes in each ontology are joined by inheritance links 3.04, 3.05, 3.06 and 3.10. Figure 3 shows the representation of the keyphrase "Italian salami sandwich" (keyphrase node 3.12). "Italian" (keyphrase node 3.07) modifies "salami" (keyphrase node 3.08), not "sandwich" (keyphrase node 3.03), so the two-word keyphrase "Italian salami" (keyphrase node 3.13) is represented by an inheritance link 3.10 to the keyphrase node 3.08 representing the keyphrase "salami" and a cross-link 3.09 to the keyphrase node 3.07 representing "Italian." The keyphrase "Italian salami sandwich" (keyphrase node 3.12) can then be represented by an inheritance link 3.06 to the keyphrase node 3.03 representing the keyphrase "sandwich" and a cross-link 3.11 to the keyphrase node 3.13 representing the keyphrase "Italian salami." Keyphrases of three or more words can also be represented in the keyphrase domain by means of multiple cross-links, possibly in combination with ontology intersections.

Figure 4 shows a representation in a cross-linked keyphrase ontology database of the example keyphrase "open-faced salami sandwich" (keyphrase node 4.11). The keyphrase "open-faced" (keyphrase node 4.08) modifies "sandwich" (keyphrase node 4.02), not "salami" (keyphrase node 4.05), so the keyphrase "open-faced salami sandwich" (keyphrase node 4.11) can be represented by an inheritance link 4.09 to the keyphrase node 4.06 representing the keyphrase "open-faced sandwich" and a cross-link 4.10 to a keyphrase node 4.05 representing the keyphrase "salami." The keyphrase node 4.06 representing the keyphrase "open-faced sandwich" can in turn be represented by an inheritance link 4.04 to the keyphrase node 4.02 representing the keyphrase "sandwich" and a cross-link 4.07 to the keyphrase node 4.08 representing the keyphrase "open-faced." As in the case of two-word keyphrases, representations of multi-word keyphrases follow syntactic linkages in the phrases themselves.
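The following sketch, with hypothetical names, shows how the nesting of inheritance links and cross-links in Figures 3 and 4 records which word modifies which. For brevity it simplifies each node to a single parent and a single incoming cross-link, which is enough for these two examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    parent: Optional["Node"] = None    # inheritance link (simplified to one parent)
    modifier: Optional["Node"] = None  # origin of the incoming cross-link (simplified to one)


def modifier_relations(node):
    """(modifier, modified) keyphrase pairs encoded by cross-links, found recursively."""
    pairs = []
    if node.modifier is not None and node.parent is not None:
        pairs.append((node.modifier.name, node.parent.name))
        pairs += modifier_relations(node.modifier)
    if node.parent is not None:
        pairs += modifier_relations(node.parent)
    return pairs


italian, salami, sandwich = Node("Italian"), Node("salami"), Node("sandwich")
open_faced = Node("open-faced")

# Figure 3: "Italian" modifies "salami".
italian_salami = Node("Italian salami", parent=salami, modifier=italian)
italian_salami_sandwich = Node("Italian salami sandwich", parent=sandwich, modifier=italian_salami)

# Figure 4: "open-faced" modifies "sandwich".
open_faced_sandwich = Node("open-faced sandwich", parent=sandwich, modifier=open_faced)
open_faced_salami_sandwich = Node("open-faced salami sandwich",
                                  parent=open_faced_sandwich, modifier=salami)

print(modifier_relations(italian_salami_sandwich))
# [('Italian salami', 'sandwich'), ('Italian', 'salami')]
print(modifier_relations(open_faced_salami_sandwich))
# [('salami', 'open-faced sandwich'), ('open-faced', 'sandwich')]
```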

Keyphrase nodes in a keyphrase domain can be described by the keyphrases they represent or by other keyphrases. The following rules determine the keyphrases with which a keyphrase node can be described. Aside from the keyphrase which it represents, the set of keyphrases which can be used to describe a keyphrase node includes:

(I) the names of its ancestors in the keyphrase domain ontology(ies) to which it is attached by inheritance links; and

(II) keyphrases formed by concatenating a first and second keyphrase, in which the second element is determined by rule I and the first element is either II(a) the name of a keyphrase node in another ontology from which it receives a cross-link, either directly or by inheritance from its ancestors, or II(b) the name of a keyphrase node ancestral to a keyphrase node in another ontology from which it receives a cross-link, directly or by inheritance.

In Figure 2, for example, the keyphrase node 2.23 which represents "lamb Napoletana" can be described, by rule I, by the keyphrase "lamb" 2.17, and by rule II(a) by the keyphrase "Italian lamb," which is formed by concatenating "Italian" 2.16 with "lamb" 2.17. The keyphrase node 2.23 which represents "lamb Napoletana" can also be described, by rule II(b), by the keyphrase "regional lamb," which is formed by concatenating "regional" 2.07 with "lamb" 2.17.

The following rules determine the set of keyphrases linked to an object node (and hence to the object it represents) in the object domain of the cross-linked keyphrase ontology database. The set of keyphrases linked to an object node in the object domain includes:

(i) the names of its ancestors in the keyphrase domain ontology(ies) to which it is attached by inheritance links;

(ii) the names of the keyphrase nodes in other ontologies from which it receives cross-links, either directly or by inheritance from its ancestors; and

(iii) the additional keyphrases, by rules (i) and (ii) above, by which keyphrase nodes from which it receives cross-links, directly or by inheritance, can be described.

In Figure 2, for example, the object "Beppo's restaurant," which is represented by an object node 2.27, is linked by rule (i) to the keyphrase "restaurant" (keyphrase node 2.05), by rule (ii) to the keyphrase "lamb Napoletana" (keyphrase node 2.23), and by rule (iii) to the keyphrase "Italian lamb."

For matching an object node in a cross-linked ontology database with an object node in a structured representation for searching (see below), an object node linked with a keyphrase node representing a keyphrase defined by rule (iii) is considered cross-linked to a keyphrase node representing that keyphrase.
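Rules I and II for keyphrase nodes, and rules (i) through (iii) for object nodes, can be sketched over the Figure 2 example. The code and its names are hypothetical. The cross-link from "lamb Napoletana" to the object node is an assumption inferred from the rule (ii) example, and the filter that drops concatenations such as "Italian Italian food" is our own addition rather than part of the rules.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    parents: list = field(default_factory=list)     # inheritance links to parent nodes
    cross_from: list = field(default_factory=list)  # origins of incoming cross-links


def ancestors(node):
    found, stack = [], list(node.parents)
    while stack:
        n = stack.pop()
        if n not in found:
            found.append(n)
            stack.extend(n.parents)
    return found


def received_origins(node):
    """Origins of cross-links received directly or by inheritance from ancestors."""
    return [o for n in [node, *ancestors(node)] for o in n.cross_from]


def describe(node):
    """Keyphrases describing a keyphrase node under rules I and II."""
    rule_i = {a.name for a in ancestors(node)}
    origins = received_origins(node)
    firsts = {o.name for o in origins}                         # rule II(a)
    firsts |= {a.name for o in origins for a in ancestors(o)}  # rule II(b)
    # Our own filter, not part of the rules: drop repeats such as "Italian Italian food".
    rule_ii = {f"{first} {second}" for first in firsts for second in rule_i
               if first.lower() not in second.lower().split()}
    return rule_i | rule_ii


def object_keyphrases(obj):
    """Keyphrases linked to an object node under rules (i)-(iii)."""
    origins = received_origins(obj)
    linked = {a.name for a in ancestors(obj)}  # rule (i)
    linked |= {o.name for o in origins}        # rule (ii)
    for o in origins:                          # rule (iii)
        linked |= describe(o)
    return linked


# Figure 2, as described in the text.
regional = Node("regional")
italian = Node("Italian", [regional])
food = Node("food")
italian_food = Node("Italian food", [food], [italian])
meat = Node("meat")
lamb = Node("lamb", [meat])
lamb_napoletana = Node("lamb Napoletana", [italian_food, lamb])  # ontology intersection
restaurant = Node("restaurant")
italian_restaurant = Node("Italian restaurant", [restaurant], [italian, italian_food])
# Assumption: the rule (ii) example implies a cross-link from "lamb Napoletana" to Beppo's.
beppo = Node("Beppo's restaurant", [italian_restaurant], [lamb_napoletana])

print({"lamb", "Italian lamb", "regional lamb"} <= describe(lamb_napoletana))         # True
print({"restaurant", "lamb Napoletana", "Italian lamb"} <= object_keyphrases(beppo))  # True
```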

Once a keyphrase descriptive of a set of retrievable objects in the object domain has been represented in the keyphrase domain, it can also receive cross-links from keyphrase nodes in other ontologies representing keyphrases with which the set of objects may be associated, and which might therefore be spoken or written by users looking for objects in the relevant retrievable set. In Figure 2, for example, the keyphrase node 2.14 representing the keyphrase "Italian restaurant" receives a cross-link 2.18 from the keyphrase node 2.15 in the food ontology 2.02 representing the keyphrase "Italian food." Note that the keyphrase "Italian food" has no specified syntactic or predicate relation to the keyphrase "Italian restaurant" (keyphrase node 2.14); the cross-link 2.18 serves only to link a keyphrase to descendants of the keyphrase node 2.14 representing the keyphrase "Italian restaurant."

As the depth of ontologies in a cross-linked keyphrase ontology database grows, where depth is the number of levels of the average ontology in the database, the number of keyphrases attached to any retrievable object, and hence the recall capabilities of the system, increase accordingly. This is illustrated by Figure 5, which shows the results of adding one more layer of depth to the restaurant, food and nationality ontologies previously shown in Figure 2. Figure 5 shows a keyphrase domain 5.33 and an object domain 5.35 for a database used to index restaurants. The keyphrase domain 5.33 shown in Figure 5 has four ontologies: one for restaurants (which are retrievable objects) 5.01, one for food types 5.02, one for nationalities 5.03, and one for meat 5.04. As shown in Figure 5, the restaurant ontology 5.01 contains three keyphrase nodes representing the keyphrases "restaurant" 5.05, "Italian restaurant" 5.14, and "Neapolitan restaurant" 5.24, from which the object node 5.36 representing "Beppo's restaurant" descends. The food ontology 5.02 shown in Figure 5 has four keyphrase nodes representing the keyphrases "food" (keyphrase node 5.06), "Italian food" (keyphrase node 5.15), "Neapolitan food" (keyphrase node 5.25), and "lamb Napoletana" (keyphrase node 5.31). The nationality ontology 5.03 shown in Figure 5 contains three keyphrase nodes representing the keyphrases "regional" (keyphrase node 5.07), "Italian" (keyphrase node 5.16), and "Neapolitan" (keyphrase node 5.26). The meat ontology 5.04 contains three keyphrase nodes representing the keyphrases "meat" (keyphrase node 5.08), "lamb" (keyphrase node 5.17), and "lamb Napoletana" (keyphrase node 5.31). The object domain 5.35 as shown in Figure 5 includes just one object node 5.36, representing a retrievable object, "Beppo's Restaurant." In Figure 5, the keyphrase nodes representing the keyphrases "Italian restaurant" (keyphrase node 5.14), "Italian food" (keyphrase node 5.15), "Italian" (keyphrase node 5.16) and "lamb Napoletana" (keyphrase node 5.31), and the object node 5.36 representing "Beppo's restaurant," are cross-linked with each other in the same way as shown in Figure 2.

The difference between Figure 5 and Figure 2 is that: (i) the keyphrase "Neapolitan
restaurant" (keyphrase node 5.24) has been added to the restaurant ontology 5.01; (ii) the keyphrase
"Neapolitan food" (keyphrase node 5.25) has been added to the food ontology 5.02; and (iii) the keyphrase
"Neapolitan" (keyphrase node 5.26) has been added to the nationality ontology 5.03. Following
the rules described above for determining which keyphrases are linked to an object represented
by a node in the object domain, as a result of the changes reflected in Figure 5, "Beppo's
restaurant" (object node 5.36) is linked with the additional keyphrases "Neapolitan restaurant"
(keyphrase node 5.24), "Neapolitan food" (keyphrase node 5.25), and "Neapolitan" (keyphrase node
5.26), as well as others which users are less likely to enter (e.g., "Italian Neapolitan restaurant").

The number of keyphrase cross-links associated with any given retrievable object increases
combinatorially with increased ontology depth, due to cross-link and inheritance patterns.
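The effect of depth can be sketched in a few lines of Python. The sketch assumes a simplified linking rule that is ours, not the patent's exact rule set: an object is linked to every keyphrase reachable from it by following inheritance links upward and by following cross-links back to the keyphrase nodes that send them. The edges are a hypothetical, simplified version of the Beppo's restaurant example. The cross-links into the "Neapolitan" nodes are assumed, because the text does not list them. Compound variants such as "Italian Neapolitan restaurant" are not modeled.

```python
from collections import deque


def linked_keyphrases(start, parent, cross_from):
    """Return every keyphrase linked to `start` (excluding `start` itself).

    parent:     node -> parent node (inheritance link inside one ontology)
    cross_from: node -> keyphrase nodes that send a cross-link to it
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        neighbors = list(cross_from.get(node, []))
        if node in parent:
            neighbors.append(parent[node])
        for nxt in neighbors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Hypothetical two-level version, roughly in the shape of Figure 2.
shallow_parent = {
    "Beppo's restaurant": "Italian restaurant",
    "Italian restaurant": "restaurant",
    "Italian food": "food",
    "Italian": "regional",
}
shallow_cross = {
    "Italian restaurant": ["Italian food", "Italian"],
    "Italian food": ["Italian"],
}

# Hypothetical three-level version, roughly in the shape of Figure 5.
deep_parent = dict(shallow_parent)
deep_parent.update({
    "Beppo's restaurant": "Neapolitan restaurant",
    "Neapolitan restaurant": "Italian restaurant",
    "Neapolitan food": "Italian food",
    "Neapolitan": "Italian",
})
deep_cross = dict(shallow_cross)
deep_cross.update({
    "Neapolitan restaurant": ["Neapolitan food", "Neapolitan"],
    "Neapolitan food": ["Neapolitan"],
})

shallow = linked_keyphrases("Beppo's restaurant", shallow_parent, shallow_cross)
deep = linked_keyphrases("Beppo's restaurant", deep_parent, deep_cross)
print(len(shallow), len(deep))  # 6 9
```

Each added layer contributes its own nodes. Their cross-links then pull in further keyphrases, so the linked set grows even though the object itself was indexed only once.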

Keyphrase nodes corresponding to keyphrases in the keyphrase domain may also be labeled
with synonyms or metonyms to facilitate the search process. A keyphrase node in the keyphrase
domain corresponding to "automobile," for example, can also be labeled with the synonym "car."
Synonyms with which keyphrase nodes are labeled may also include non-standard English (e.g.,
"bbq" for "barbecue"), non-English equivalents (e.g., "Napoletana" for "Neapolitan"), or even
variant spellings of the same word (e.g., "barbeque" for "barbecue"). A keyphrase node in the
keyphrase domain corresponding to "dining" in a restaurant database may also be labeled with the
metonym "table." Although "dining" and "table" are not synonymous, users may speak or write
the word "table" in sentences in which they mean "dining" (e.g., "a restaurant with outdoor
tables" rather than "a restaurant with outdoor dining"). Unlike synonyms, metonyms are highly
domain dependent. "Table," for instance, is not a metonym for "dining" in a furniture domain,
where "dining tables" are known and are distinct from other tables. Keyphrases can be in any
natural language, including English.
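Labeling can be pictured as a set of alternative names attached to each keyphrase node. The sketch below is hypothetical: the class, its fields and the `resolve` helper are illustrative names, not part of the described system. Metonyms are keyed by domain, so "table" resolves to "dining" for restaurants but not for furniture.

```python
from dataclasses import dataclass, field


@dataclass
class KeyphraseNode:
    """A keyphrase node labeled with synonyms and domain-specific metonyms."""

    keyphrase: str
    synonyms: set = field(default_factory=set)
    # Metonyms are keyed by domain, because they only hold inside one domain.
    metonyms: dict = field(default_factory=dict)

    def labels(self, domain):
        names = {self.keyphrase} | self.synonyms | self.metonyms.get(domain, set())
        return {name.lower() for name in names}


def resolve(term, nodes, domain):
    """Return the keyphrases whose node carries `term` as a label in `domain`."""
    term = term.lower()
    return [node.keyphrase for node in nodes if term in node.labels(domain)]


nodes = [
    KeyphraseNode("barbecue", {"bbq", "barbeque"}),
    KeyphraseNode("Neapolitan", {"Napoletana"}),
    KeyphraseNode("dining", metonyms={"restaurant": {"table"}}),
]
print(resolve("BBQ", nodes, "restaurant"))    # ['barbecue']
print(resolve("table", nodes, "restaurant"))  # ['dining']
print(resolve("table", nodes, "furniture"))   # []
```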

The ontologies shown in Figures 2 and 5 are noun and adjective ontologies. Verb
ontologies can also be created, cross-linked, and joined to adverb, noun and adjective
ontologies. Figure 6 shows an example ontology for verbs which correspond to various ways of
"going." As shown in Figure 6, nodes 6.09-6.12 and 6.17-6.19 representing specific ways of
"going" are connected by inheritance links 6.04-6.07 and 6.14-6.16 to a node 6.02 representing "go"
in general. A keyphrase node 6.01 representing the keyphrase "quickly" is cross-linked 6.08 with
a child 6.21 of "jog" to represent the verbal keyphrase "quickly jog" ("quickly jog" is a child of
"jog" by virtue of the inheritance link 6.20 which connects keyphrase nodes 6.18 and 6.21). The
keyphrase node 6.01 corresponding to the keyphrase "quickly" is shown as a single keyphrase
node. A child 6.23 of a keyphrase node 6.03 representing "mile," also shown here as a single
keyphrase node, is cross-linked 6.22 to the keyphrase node 6.21 representing the keyphrase
"quickly jog," to represent the three-word verbal keyphrase "quickly jog (a) mile" 6.23. Figure 6
shows a schema for representing verbal keyphrases which assigns head word status to the noun
syntactic object ("mile" in this case). Conceptually, this is equivalent to the three-word keyphrase
representing a "mile (that is) quickly jogged."

Verbs can also function as head words, in which case adverbs and some or all of their
syntactic arguments can be attached to them. Figure 7 shows the same example ontology for
verbs which correspond to various ways of "going" as shown in Figure 6. Nodes 7.09-7.12 and
7.17-7.19 representing specific ways of "going" are connected by inheritance links 7.04-7.07 and
7.14-7.16 to a node 7.02 representing "go" in general. Figure 7 also shows a node 7.01
representing the keyphrase "quickly," and a node 7.03 representing the keyphrase "mile." Figure
7 shows how the three-word keyphrase "quickly jog (a) mile" could be represented by a
keyphrase node 7.21 descended from the keyphrase node 7.18 corresponding to "jog." The
choice of these or other schemes for cross-linking nouns and verbs depends on properties of the
database domain and can be made for reasons of convenience, as long as one scheme is carried
through consistently in deploying this invention.
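The two schemes can be compared in a small, hypothetical sketch. `Node`, `reachable` and `ontology_root` are illustrative names. The cross-links from "quickly" and "mile" in the verb-headed version are an assumption, since the text says only that such arguments "can be attached." Both versions reach the same three words; they differ in which ontology the composite node descends from.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    parent: Optional["Node"] = None                  # inheritance link
    cross_links: list = field(default_factory=list)  # cross-linked nodes


def reachable(node):
    """Labels reached from `node` through inheritance links and cross-links."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current.label in seen:
            continue
        seen.add(current.label)
        if current.parent is not None:
            stack.append(current.parent)
        stack.extend(current.cross_links)
    return seen


def ontology_root(node):
    """Root of the ontology that a node descends from."""
    while node.parent is not None:
        node = node.parent
    return node.label


go = Node("go")
jog = Node("jog", parent=go)
quickly = Node("quickly")
mile = Node("mile")

# Figure 6 style: the noun object "mile" is the head word.
quickly_jog = Node("quickly jog", parent=jog, cross_links=[quickly])
noun_headed = Node("quickly jog (a) mile", parent=mile, cross_links=[quickly_jog])

# Figure 7 style: the verb "jog" is the head word (cross-links assumed).
verb_headed = Node("quickly jog (a) mile", parent=jog, cross_links=[quickly, mile])

print(ontology_root(noun_headed), ontology_root(verb_headed))  # mile go
```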

In general, a cross-linked keyphrase ontology database is a database in which: 

(a) keyphrases are represented as keyphrase nodes in ontologies, each ontology having as
many keyphrase nodes (and as great a depth) as necessary to represent a domain;

(b) keyphrases may be generated by parsing a text;

(c) keyphrases are represented as intersections of ontologies, or by cross-linking a
keyphrase node descendant from one or more ontologies to keyphrase nodes belonging to
other ontologies, or any equivalent representations;

(d) keyphrases may include one or more words in common;

(e) cross-links are inherited through ontologies;

(f) given the rules of inheritance, cross-links are created to relate all descendants of a
recipient keyphrase node with appropriate keyphrases, given the data domain; and

(g) retrievable objects are represented by object nodes descendant from at least one
keyphrase node in the keyphrase ontologies and possibly cross-linked directly (rather than by
inheritance) with one or more keyphrase nodes in the keyphrase ontologies.
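The properties listed above map naturally onto a small in-memory data structure. The following is a minimal sketch, not the patent's implementation. The class and method names are hypothetical. Cross-links are stored in one direction, from a recipient node to the keyphrase it gains, which is one reading of items (c), (e) and (f). The root node "color" in the example is a hypothetical name.

```python
from collections import defaultdict


class CrossLinkedKeyphraseDB:
    """In-memory sketch of a cross-linked keyphrase ontology database."""

    def __init__(self):
        self.ontology = {}             # keyphrase node -> ontology name, item (a)
        self.parent = {}               # node -> parent node (inheritance link)
        self.cross = defaultdict(set)  # recipient node -> keyphrases it is cross-linked with
        self.objects = set()

    def add_keyphrase(self, keyphrase, ontology, parent=None):
        # Inheritance links stay inside one ontology; cross-links join ontologies.
        if parent is not None and self.ontology.get(parent) != ontology:
            raise ValueError(f"{parent!r} is not in ontology {ontology!r}")
        self.ontology[keyphrase] = ontology
        self.parent[keyphrase] = parent

    def cross_link(self, recipient, keyphrase):
        # Items (c) and (f): the recipient and all of its descendants gain `keyphrase`.
        if keyphrase not in self.ontology:
            raise KeyError(keyphrase)
        self.cross[recipient].add(keyphrase)

    def add_object(self, name, parent, cross_linked=()):
        # Item (g): an object node descends from at least one keyphrase node.
        if parent not in self.ontology:
            raise KeyError(parent)
        self.parent[name] = parent
        self.objects.add(name)
        for keyphrase in cross_linked:
            self.cross_link(name, keyphrase)

    def keyphrases_of(self, node):
        """Item (e): keyphrases reached through inheritance links and cross-links."""
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            for nxt in (self.parent.get(current), *self.cross.get(current, ())):
                if nxt is not None and nxt not in found:
                    found.add(nxt)
                    stack.append(nxt)
        return found


db = CrossLinkedKeyphraseDB()
db.add_keyphrase("color", "color")
db.add_keyphrase("yellow", "color", parent="color")
db.add_keyphrase("peach", "color", parent="yellow")
db.add_keyphrase("shoe", "shoe")
db.add_keyphrase("running shoe", "shoe", parent="shoe")
db.add_object("Shoe #34", "running shoe", cross_linked=["peach"])
print(sorted(db.keyphrases_of("Shoe #34")))
```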




Indexing Retrievable Objects 

The process of indexing retrievable objects, including documents, web pages, pointers and
executable computer programs, in the object domain is the process of linking the object nodes
with keyphrase nodes in the keyphrase domain by inheritance links and cross-links. Generally, the
method of indexing retrievable objects involves the following steps: (a) representing the
retrievable object by an object node in an ontology; and (b) cross-linking the object node to a
keyphrase node, where the keyphrase node represents a keyphrase in a second ontology and the
keyphrase is related to the retrievable object. In one embodiment, the keyphrase is determined by
parsing a text associated with the retrievable object. The retrievable object may be a document, a
web page, a pointer or an executable computer program. Indexing can be readily performed by indexers
with graphical and command line tools, or can be performed automatically, using a natural language
understanding device, or parser, or a relational database interface. For a particular object,
indexers can simply anticipate, using their knowledge of the particular domain, keyphrases that
others may use in searching for an item like the object being indexed. These keyphrases are
therefore related to the objects being indexed. If the object, for example, is a peach running shoe,
the indexer might anticipate that the keyphrases "peach" and "running shoe" might be produced
by users seeking a similar item. By creating an inheritance link between the object node
representing the object and a node representing "running shoe" in a shoe ontology, and a cross-link
from the object node to a node representing "peach" in a color ontology, the indexer can
ensure that users whose input, when processed by a natural language understanding
system, produces the keyphrases "peach" and "running shoe" will be returned the object node
currently being indexed. Figure 8 shows how a cross-linked keyphrase ontology database might be
constructed for such a shoe domain. As shown in Figure 8, the keyphrase domain 8.19 contains a
shoe ontology comprising two keyphrase nodes 8.01 and 8.07 and a color ontology comprising
five keyphrase nodes 8.02, 8.09, 8.10, 8.14 and 8.15. In Figure 8, additional keyphrase nodes
representing "running" (keyphrase node 8.06) and "light-weight" (keyphrase node 8.08) are
shown, but not as part of any ontology. An object node 8.21 in the object domain 8.20 represents a
particular shoe, Shoe #34, which is a child of the keyphrase node 8.07
representing the keyphrase "running shoe." Shoe #34 (object node 8.21) is cross-linked 8.17 and
8.18 to keyphrase nodes 8.08 and 8.14 representing the keyphrases "light-weight" and "peach,"
respectively, as well as to a keyphrase node 8.06 representing the keyphrase "running," by
inheritance from its parent keyphrase node 8.07. Other keyphrase nodes 8.15 and 8.10 represent other
possible cross-links or inheritances that are found in the cross-linked keyphrase ontology
database.
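The indexing of Shoe #34 in Figure 8 can be sketched with node numbers as identifiers. This is an illustration under stated assumptions. Only the nodes named in the text are included; the unnamed color nodes 8.02, 8.10 and 8.15 are omitted. The link from "running shoe" to "running" is modeled as a cross-link stored on node 8.07, following the text's statement that Shoe #34 gets "running" by inheritance from 8.07. The helper names are hypothetical.

```python
# Keyphrase nodes named in the Figure 8 description (unnamed nodes omitted).
LABELS = {
    "8.01": "shoe", "8.07": "running shoe", "8.09": "yellow",
    "8.14": "peach", "8.06": "running", "8.08": "light-weight",
}
PARENT = {"8.07": "8.01", "8.14": "8.09"}  # inheritance links
CROSS = {"8.07": {"8.06"}}                 # "running shoe" is linked with "running"


def index_object(obj, parent, cross_linked, parent_map, cross_map):
    """Indexing steps (a) and (b): place the object node, then cross-link it."""
    parent_map[obj] = parent
    cross_map[obj] = set(cross_linked)


def linked_labels(node, parent_map, cross_map, labels):
    """Keyphrases linked to `node` through inheritance links and cross-links."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        nxt_nodes = set(cross_map.get(current, ()))
        if current in parent_map:
            nxt_nodes.add(parent_map[current])
        for nxt in nxt_nodes - found:
            found.add(nxt)
            stack.append(nxt)
    return {labels[n] for n in found if n in labels}


parent_map = dict(PARENT)
cross_map = {k: set(v) for k, v in CROSS.items()}
index_object("8.21", "8.07", ["8.08", "8.14"], parent_map, cross_map)  # Shoe #34
print(sorted(linked_labels("8.21", parent_map, cross_map, LABELS)))
```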

Figure 9a shows the process of indexing Shoe #34 (object node 8.21) from data coming 
from a relational database or table of information. The upper part of Figure 9a replicates the 

keyphrase domain of the cross-linked ontology database shown in Figure 8 used to index shoes.
The keyphrase domain 9.16 contains a shoe ontology comprising two keyphrase nodes 9.01 and
9.07 and a color ontology comprising five keyphrase nodes 9.02, 9.09, 9.10, 9.14 and 9.15. In
Figure 9a, additional keyphrase nodes representing "running" (keyphrase node 9.06) and
"light-weight" (keyphrase node 9.08) are shown, but not as part of any ontology.

As Figure 9a shows, a table 9.26 containing information about Shoe #34 (object node 8.21,
also shown here as 9.23) is processed by a relational database interface 9.25 to generate a
structured representation 9.24 of Shoe #34 (object node 9.23). The table 9.26 shows attributes of
Shoe #34 (object node 9.23), and therefore keyphrase nodes generated from table 9.26 are related
to Shoe #34 (object node 9.23). The table 9.26 indicates that Shoe #34 (object node 9.23) is
identified 9.27 by "#34" 9.31, the type of item 9.28 is a "running shoe" 9.32, its color 9.29 is
"peach" 9.33, and a description 9.30 is that it is "Lightweight" 9.34. The relational database
interface 9.25 allows an indexer to specify whether values found in a column in a relational
database should be linked to the object node being indexed by an inheritance link or a cross-link.
The structured representation 9.24 shows that the object node 9.23 that represents
Shoe #34 is connected by an inheritance link to the keyphrase node 9.17 that represents "running
shoe" and is cross-linked 9.21 and 9.22 to keyphrase nodes 9.18 and 9.19, respectively, that represent
the keyphrases "peach" and "light-weight." The structured representation 9.24 is then linked to
the keyphrase domain of the cross-linked keyphrase ontology by linking the keyphrase nodes in
the structured representation 9.24 to keyphrase nodes that represent the same keyphrases (or
synonymous keyphrases) in the keyphrase domain 9.16. Thus the object node representing
"Shoe #34" (object node 9.23) is connected by an inheritance link to "running
shoe" (keyphrase node 9.07), and it is cross-linked to the keyphrase node 9.14 representing the
keyphrase "peach" and the keyphrase node 9.08 representing the keyphrase "light-weight."
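The two stages of Figure 9a can be sketched as follows. First, a table row becomes a structured representation according to a per-column choice of inheritance link or cross-link. Second, that representation is attached to the keyphrase domain, with a synonym table used where the stored value and the keyphrase differ ("Lightweight" versus "light-weight"). The column configuration, class and function names are hypothetical. The sketch is not the interface 9.25 itself.

```python
from dataclasses import dataclass, field

# Hypothetical indexer configuration: how each column links to the object node.
COLUMN_LINKS = {"type": "inheritance", "color": "cross-link", "description": "cross-link"}


@dataclass
class StructuredRepresentation:
    object_name: str
    parents: list = field(default_factory=list)      # inheritance links
    cross_links: list = field(default_factory=list)  # cross-links


def row_to_representation(row, id_column, column_links):
    """Turn one table row into a structured representation of the object."""
    rep = StructuredRepresentation(object_name=row[id_column])
    for column, value in row.items():
        kind = column_links.get(column)
        if kind == "inheritance":
            rep.parents.append(value)
        elif kind == "cross-link":
            rep.cross_links.append(value)
    return rep


def link_to_domain(rep, domain, synonyms):
    """Map each value onto the keyphrase node that represents it (or a synonym)."""
    def canonical(value):
        key = synonyms.get(value.lower(), value.lower())
        if key not in domain:
            raise KeyError(f"no keyphrase node for {value!r}")
        return key

    return StructuredRepresentation(
        rep.object_name,
        [canonical(v) for v in rep.parents],
        [canonical(v) for v in rep.cross_links],
    )


row = {"id": "#34", "type": "running shoe", "color": "peach", "description": "Lightweight"}
domain = {"shoe", "running shoe", "yellow", "peach", "running", "light-weight"}
rep = row_to_representation(row, "id", COLUMN_LINKS)
linked = link_to_domain(rep, domain, {"lightweight": "light-weight"})
print(linked)
```

Raising an error for an unmatched value is a design choice of the sketch. An indexing tool might instead flag the value for an indexer to add to the keyphrase domain.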

Figure 9b shows how the same information can be taken from a text that describes Shoe
#34 (object node 8.21). Because the text is about Shoe #34, keyphrases derived from the text are
related to Shoe #34. The upper part of Figure 9b replicates the keyphrase domain of the
cross-linked ontology database shown in Figure 8 used to index shoes. The keyphrase domain 9.56
contains a shoe ontology comprising two keyphrase nodes 9.41 and 9.47 and a color ontology
comprising five keyphrase nodes 9.42, 9.49, 9.50, 9.54 and 9.55. In Figure 9b, additional
keyphrase nodes 9.46 and 9.48, representing the keyphrases "running" and "light-weight"
respectively, are shown, but not as part of any ontology.



WO 02/05137 PCT/US01/21459

Parts of the text 9.66 are processed with the natural language understanding device 9.65
to create a structured representation 9.54 of some of the information contained in the text 9.66.
Parsing systems, or more generally, language understanding systems, that produce structured
representations of natural language input using rules of syntax and grammar are well known (see
Allen, J., Natural Language Understanding (Menlo Park, Calif.: Benjamin-Cummings, 1995),
which is incorporated herein in its entirety by reference). In the example shown, the natural
language understanding device 9.65 has generated the structured representation showing that the
object node Shoe #34 (object node 9.63) is a child of the node that represents "running shoe"
(keyphrase node 9.57) and is cross-linked 9.61 and 9.62 to keyphrase nodes that represent
"peach" (keyphrase node 9.58) and "light-weight" (keyphrase node 9.59). The structured
representation 9.54 is then linked to the keyphrase domain of the cross-linked keyphrase ontology
by linking the object node representing Shoe #34 (object node 9.63) to keyphrase nodes that
represent the same keyphrases (or synonymous keyphrases) in the keyphrase domain 9.56. Thus
the object node representing "Shoe #34" (object node 9.63) is connected by an inheritance link to
"running shoe" (keyphrase node 9.47), and it is cross-linked to the keyphrase node representing
"peach" (keyphrase node 9.54) and the node representing "light-weight" (keyphrase node 9.48).

Searching for Retrievable Objects 

The methods and systems of the invention also permit searching a cross-linked keyphrase
ontology database. Searching comprises the steps of: (a) parsing a natural language statement into
a structured representation, where the structured representation comprises at least one keyphrase;
(b) searching the cross-linked keyphrase ontology database for at least one object node,
where the object node is cross-linked to a keyphrase node representing a second keyphrase, where
the second keyphrase matches the keyphrase parsed in step (a); and (c) defining a search result as
a retrievable object, wherein the retrievable object is represented by the object node. The search
result can be displayed to a user in a list. The retrievable object may be an executable computer
program. The natural language statement may be a query.

In one embodiment, the keyphrase in step (a) and the second keyphrase are identical. In
another embodiment, the keyphrase in step (a) and the second keyphrase are synonyms, and in
another embodiment, the keyphrase in step (a) and the second keyphrase are metonyms.
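Steps (b) and (c), together with the three embodiments of "matches," can be sketched as below. The synonym and metonym tables, the object names and the index are all hypothetical. Metonyms are treated as one-directional (the query term "table" stands for the stored keyphrase "dining"), while synonyms are checked in both directions.

```python
def match_kind(query_kp, stored_kp, synonyms, metonyms):
    """Classify how a parsed keyphrase matches a keyphrase in the database."""
    if query_kp == stored_kp:
        return "identical"
    if stored_kp in synonyms.get(query_kp, set()) or query_kp in synonyms.get(stored_kp, set()):
        return "synonym"
    if stored_kp in metonyms.get(query_kp, set()):
        return "metonym"
    return None


def search(query_kps, index, synonyms, metonyms):
    """Steps (b)-(c): objects whose keyphrases match every parsed keyphrase."""
    return [
        obj for obj, stored in index.items()
        if all(any(match_kind(q, s, synonyms, metonyms) for s in stored) for q in query_kps)
    ]


SYNONYMS = {"car": {"automobile"}}
METONYMS = {"table": {"dining"}}  # valid for a restaurant domain only
INDEX = {"object A": {"outdoor", "dining"}, "object B": {"automobile"}}

print(match_kind("car", "automobile", SYNONYMS, METONYMS))     # synonym
print(search(["outdoor", "table"], INDEX, SYNONYMS, METONYMS))  # ['object A']
```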

Searching is done by converting an input query into a structured representation, and then
finding object nodes in the cross-linked keyphrase ontology database that match the structured
representation. The natural language understanding device constructs keyphrases from a natural
language input query, and determines the structured representation of the query based on rules of
syntax and grammar, and by disambiguation using the cross-linked keyphrase ontology database.
The keyphrase "running shoes," for example, may appear in an input sentence (e.g., "I want
running shoes"), and may correspond to a keyphrase node, and hence a keyphrase, in a
cross-linked keyphrase ontology database. However, the input may have taken the forms "I want shoes
for running," "I want shoes to use for running," or others, in which the keyphrase "running shoes"
does not appear. The natural language understanding device serves to retrieve the keyphrase
"running shoes" from as many of these variant request constructions as possible.

The methods and systems of this invention are not, however, limited by a particular
method of constructing structured representations. Other methods which may be used to form
such representations are described in Allen, J., Natural Language Understanding (Menlo Park,
Calif.: Benjamin-Cummings, 1995).

In the example shown, the cross-linked keyphrase ontology database illustrated in Figure 8
has been set up and a user enters the query "I want a yellow running shoe." Figure 10 shows a
structured representation of the object node 10.03 that the query specifies, based on the syntax of the
query sentence. As shown in Figure 10, the object node 10.03 specified in the query will be a
descendant of a keyphrase node 10.01 representing the keyphrase "shoe" and will be cross-linked
10.04 and 10.06 to keyphrase nodes representing the keyphrases "yellow" (keyphrase node 10.05)
and "running" (keyphrase node 10.07). In one embodiment of this invention, the structured
representation shown in Figure 10 also comprises keyphrases formed by ordered series of shorter
keyphrases 10.01, 10.05 and 10.07, such as "yellow shoe" or "running shoe."

The directory database of this invention, illustrated in Figure 8, can be searched to find
every retrievable object cross-linked with the keyphrases "shoe" (keyphrase node 8.01), "yellow"
(keyphrase node 8.09), "running" (keyphrase node 8.06), or "running shoe" (keyphrase node
8.07), which are some of the keyphrases comprised by the structured representation shown in
Figure 10. In the case of Figure 8, Shoe #34 (object node 8.21) is returned because:

1) Shoe #34 (object node 8.21) is a descendant of "running shoe"
(keyphrase node 8.07), and therefore is cross-linked with the keyphrase "running shoe"
(keyphrase node 8.07); and

2) Shoe #34 (object node 8.21) is cross-linked with the keyphrase "yellow"
(keyphrase node 8.09), because the keyphrase "peach" (keyphrase node 8.14) is a
descendant of the keyphrase "yellow" (keyphrase node 8.09) in the color ontology.

Alternatively, Shoe #34 (object node 8.21) could have been returned because:

1) Shoe #34 (object node 8.21) is a descendant of the keyphrase "shoe"
(keyphrase node 8.01), and therefore is cross-linked with the keyphrase "shoe" (keyphrase
node 8.01);

2) Shoe #34 (object node 8.21) is a descendant of the keyphrase "running
shoe" (keyphrase node 8.07), and therefore inherits the keyphrase "running" (keyphrase
node 8.06); and

3) Shoe #34 (object node 8.21) is cross-linked with the keyphrase "yellow"
(keyphrase node 8.09), because "peach" (keyphrase node 8.14) is a descendant of the
keyphrase "yellow" (keyphrase node 8.09) in the color ontology.

This illustrates the process of matching an object node 10.03 in a structured representation
(Figure 10) with an object node 8.21 in a cross-linked keyphrase ontology database (Figure 8).
The match occurs where the object node in the cross-linked keyphrase ontology database is linked
with the same keyphrases as the object node in the structured representation, according to the
rules by which keyphrases are linked to object nodes. The match described here is one in which
keyphrases from the structured representation of user input are identical to the keyphrases
cross-linked to the object node 8.21 representing Shoe #34. In
another embodiment, the keyphrases from the structured representation of user input could match
by being synonyms or metonyms of the keyphrases cross-linked to the object node representing
Shoe #34 (object node 8.21).
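The "yellow running shoe" walk-through can be reproduced with a short sketch. It follows the second ("Alternatively") route: the query's parent keyphrase "shoe" and its cross-linked keyphrases "yellow" and "running" must all be among the keyphrases linked to an object. Figure 8 is encoded with plain labels, and only identical matches are checked. The function names and the non-matching "blue" query are hypothetical.

```python
# Figure 8 encoded with labels; "Shoe #34" is the only object node.
PARENT = {"running shoe": "shoe", "peach": "yellow", "Shoe #34": "running shoe"}
CROSS = {"running shoe": {"running"}, "Shoe #34": {"peach", "light-weight"}}
OBJECTS = ["Shoe #34"]


def linked(node):
    """Keyphrases linked to `node` by inheritance links and cross-links."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        nxt_nodes = set(CROSS.get(current, ()))
        if current in PARENT:
            nxt_nodes.add(PARENT[current])
        for nxt in nxt_nodes - found:
            found.add(nxt)
            stack.append(nxt)
    return found


def search(parent_kp, cross_kps):
    """Match a Figure 10 style representation: parent plus cross-linked keyphrases."""
    wanted = {parent_kp, *cross_kps}
    return [obj for obj in OBJECTS if wanted <= linked(obj)]


print(search("shoe", ["yellow", "running"]))  # ['Shoe #34']
print(search("shoe", ["blue"]))               # []
```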

Because Shoe #34 (object node 8.21) is a match, it is passed to the output
user interface device as part of a result set that can be displayed as a list. The result set can be
shown to the user on any computer or displayed over a network. The result set can be
presented visually, in text or graphic formats, or can be read aloud to the user. The output device
may also display information about Shoe #34 (object node 8.21), along with
context-appropriate text, such as "How do you like this shoe?" or "This shoe is on sale."

Disambiguating Natural Language

The methods and systems of the invention also permit disambiguating a syntactically
ambiguous natural language statement. Disambiguation comprises the steps of: (a) parsing the
syntactically ambiguous natural language statement into at least two structured representations,
where the first structured representation comprises at least one first keyphrase and the second
structured representation comprises at least one second keyphrase; (b) searching a cross-linked
keyphrase ontology database for a keyphrase node representing a third keyphrase, where the
third keyphrase matches the first keyphrase or the second keyphrase; (c) if the first keyphrase
matches the third keyphrase and the second keyphrase does not match the third keyphrase,
designating the first structured representation as a first statement interpretation; (d) if the second
keyphrase matches the third keyphrase and the first keyphrase does not match the third keyphrase,
designating the second structured representation as a second statement interpretation; and
(e) if both the first keyphrase and the second keyphrase match the third keyphrase, or if neither
the first keyphrase nor the second keyphrase matches the third keyphrase, determining that the
syntactically ambiguous natural language statement cannot be disambiguated.

The syntactically ambiguous natural language statement may be a query. In one
embodiment, the third keyphrase is identical to the first keyphrase or the second keyphrase. In
another embodiment, the third keyphrase is a synonym of the first keyphrase or the second
keyphrase, while in another embodiment the third keyphrase is a metonym of the first keyphrase
or the second keyphrase.

Disambiguation may be done on any syntactically ambiguous natural language statement in 
the English language or in any other spoken or written language. 

The method of disambiguation is further illustrated in Figure 11, which is a flow chart for
that method. Figure 11 shows that an ambiguous natural language statement 11.01 is used to
produce at least two alternative structured representations 11.02 and 11.03, each comprising at
least one keyphrase, both of which are checked 11.04 and 11.05 against a database. If both
keyphrases (A and B) are present in the database 11.08 and 11.09, or if neither keyphrase is
present 11.06 and 11.07, the syntactic ambiguity in the original statement cannot be resolved with
this method 11.12 and 11.13. If the first keyphrase (keyphrase A) 11.02 is present 11.08, but the
second keyphrase (keyphrase B) 11.03 is not present 11.07 in the database, then the first
keyphrase 11.02 is accepted 11.10 as the disambiguated interpretation of the statement 11.01. If
the second keyphrase 11.03 is present 11.09, but the first keyphrase 11.02 is not present 11.06 in
the database, then the second keyphrase 11.03 is accepted 11.11 as the disambiguated
interpretation of the statement 11.01.
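The flow chart of Figure 11 reduces to a four-way branch on two membership tests. The sketch below handles only the identical-match embodiment. It treats the keyphrase domain as a plain set of strings and represents each reading by one distinguishing keyphrase. The function name and the example readings are hypothetical.

```python
def disambiguate(candidates, keyphrase_domain):
    """Figure 11: accept the reading whose keyphrase alone is in the database.

    candidates: exactly two (interpretation, keyphrase) pairs, A and B.
    Returns the accepted interpretation, or None when both or neither
    keyphrase is present (outcomes 11.12 and 11.13).
    """
    (reading_a, kp_a), (reading_b, kp_b) = candidates
    a_present = kp_a in keyphrase_domain  # check 11.04
    b_present = kp_b in keyphrase_domain  # check 11.05
    if a_present and not b_present:
        return reading_a                  # accept A (11.10)
    if b_present and not a_present:
        return reading_b                  # accept B (11.11)
    return None


readings = [("reading A", "Italian sandwich"), ("reading B", "Italian salami")]
print(disambiguate(readings, {"Italian salami"}))  # reading B
```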
Syntactic rules are language-specific rules which specify word and phrase orders; one
such rule in English, for example, is that head nouns in prepositional phrases, such as "cheese" in
the phrase "with cheese," must be attached to phrases that come before them in a sentence.
Grammatical rules are language-specific rules governing use of punctuation; one such rule in
English, for example, is that parallel words, such as "mushrooms," "pepperoni," and "cheese" in
the phrase "with mushrooms, pepperoni, and cheese," must be separated by commas and/or
conjunctions. Syntactically and grammatically ambiguous word and phrase attachment and
reference are common in natural language and pose a major obstacle to language understanding.
Semantic knowledge is knowledge of word meanings and knowledge of the domains to which the
words refer. Semantic knowledge of "pizza," for example, might include knowledge that the
potential ingredients of pizza include tomato sauce, cheese, sausage, pepperoni, and mushrooms,
among others.

English speakers understand the possible input sentence "I want a ham and cheese
sandwich" as a request for one item. Such speakers understand the possible input sentence "I
want a coffee and cheese sandwich" as a request for two items. The distinction between these
two sentences is based on semantic knowledge, not syntax: both "ham" and "coffee" are nouns,
so the two sentences are syntactically identical. Speakers know that there is such a thing as a
sandwich made with ham and cheese, and they know that there is no such thing as a sandwich
made in part of coffee, and these facts guide their interpretations of the two sentences. In a
search for a restaurant, misinterpretation of such an input sentence would lead to erroneous
keyphrases, and hence to a search failure. "Ham and cheese sandwich," for example, could
generate a search for a restaurant cross-linked with the keyphrases "ham" and "cheese sandwich"
if it were misunderstood, while "coffee and cheese sandwich" could generate a search for an
object cross-linked with the keyphrase "coffee sandwich" or "coffee and cheese sandwich" if it
were misunderstood. The natural language understanding device can assign correct keyphrases to
sentences like these and others which are syntactically ambiguous. The input phrase "coffee and
cheese sandwich," for example, would generate the two alternative representations shown in
Figures 12 and 13, corresponding to different syntactic interpretations. Figure 12 shows a
structured representation comprising the keyphrases "coffee" and "cheese sandwich." Since the
representation of the keyphrase "coffee" (keyphrase node 12.01) is not directly linked to the
representation of the keyphrase "sandwich" (keyphrase node 12.05), this representation does not
comprise any keyphrase in which the keyphrase "sandwich" (keyphrase node 12.05) is
syntactically modified by the keyphrase "coffee" (keyphrase node 12.01). The structured
representation shown in Figure 12 corresponds to the semantically correct interpretation of the
phrase as signifying two different objects, coffee and a sandwich.

Figure 13 shows a structured representation comprising the keyphrases "coffee sandwich"
and "cheese sandwich." Since the representation of the keyphrase "coffee" (keyphrase node
13.01) is directly linked 13.02 to the representation of the keyphrase "sandwich" (keyphrase node
13.05), this representation does comprise a keyphrase in which "sandwich" (keyphrase node
13.05) is syntactically modified by the keyphrase "coffee" (keyphrase node 13.01). The
structured representation shown in Figure 13 corresponds to the semantically incorrect
interpretation of the phrase as signifying one object, "a sandwich made of coffee and of cheese."
Since the candidate keyphrase "coffee sandwich" will not be represented in the keyphrase domain
of a cross-linked keyphrase ontology database, while the keyphrases "coffee" and "cheese
sandwich" might be represented, the method of Figure 11 will likely lead to the structured
representation shown in Figure 12 being accepted as the correctly disambiguated interpretation of
the input phrase "coffee and cheese sandwich."
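The "coffee and cheese sandwich" case can be sketched by generating both readings and keeping the one whose keyphrases all exist in the keyphrase domain. Requiring every keyphrase of a reading to be present is our generalization of the single-keyphrase check in Figure 11, not a rule stated in the text. The phrase shape "X and Y head" and the example domain are hypothetical.

```python
def readings(first, second, head):
    """Two parses of '<first> and <second> <head>' (an assumed phrase shape)."""
    return {
        "two objects": {first, f"{second} {head}"},               # Figure 12 shape
        "one object": {f"{first} {head}", f"{second} {head}"},    # Figure 13 shape
    }


def choose(candidate_readings, keyphrase_domain):
    """Accept the single reading whose keyphrases all exist; otherwise None."""
    accepted = [name for name, kps in candidate_readings.items() if kps <= keyphrase_domain]
    return accepted[0] if len(accepted) == 1 else None


domain = {"coffee", "cheese sandwich", "ham sandwich"}
print(choose(readings("coffee", "cheese", "sandwich"), domain))  # two objects
print(choose(readings("ham", "cheese", "sandwich"), domain))     # one object
```

If this domain also contained "ham" on its own, both readings of "ham and cheese sandwich" would pass. The sketch would then return None, which corresponds to the "cannot be disambiguated" outcome.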

Similarly the natural language understanding system disambiguates attachment of 
contiguous modifiers by checldng the keyphrase domain of the cross-linked keyphrase ontology 
database to see if candidate keyphrases exist in that domain. For example, the input phrase 

25 "ItaUan salami sandwich" might refer to an ItaUan sandwich composed of salami (with , 
resulting structured representation shown in Figure 14) or a sandwich made with Italian salami 
(with the resulting structured representation shown in Figure 15). In Figure 14, an object node
14.05, which will match an object node when the database is searched, has an inheritance link
14.02 with a parent node 14.01 representing the keyphrase "sandwich" (keyphrase node 14.01)
and receives cross-links 14.03 and 14.06 from nodes representing the keyphrase "Italian"
(keyphrase node 14.04) and the keyphrase "salami" (keyphrase node 14.07). Because the
representation of the keyphrase "Italian" (keyphrase node 14.04) in Figure 14 is linked, via the
object node 14.05, with the representation of the keyphrase "sandwich" (keyphrase node 14.01),
Figure 14 comprises keyphrases in which the keyphrase "sandwich" (keyphrase node 14.01) is
syntactically modified by the keyphrase "Italian" (keyphrase node 14.04). In Figure 15, an object
node 15.05, which will match an object node when the database is searched, has an inheritance
link 15.02 with a parent node 15.01 representing the keyphrase "sandwich," and a cross-link
15.06 to a keyphrase node 15.07 representing the keyphrase "salami," which in turn has a cross-link
15.03 to a keyphrase node 15.04 representing the keyphrase "Italian." Since the
representation of the keyphrase "Italian" (keyphrase node 15.04) in Figure 15 is not directly
linked, via the object node 15.05, with the representation of the keyphrase "sandwich" (keyphrase
node 15.01), Figure 15 does not comprise keyphrases in which the keyphrase "sandwich"
(keyphrase node 15.01) is syntactically modified by the keyphrase "Italian" (keyphrase node
15.04). Hence, the natural language understanding system could choose between these two
structured representations by checking the keyphrase domain for the keyphrase "Italian sandwich."
Failing to find such a keyphrase, and instead finding a keyphrase node representing the keyphrase
"Italian salami," a keyphrase comprised by the structured representation shown in Figure 15 but
not by the structured representation shown in Figure 14, might cause the natural language
understanding system to accept a structured representation of the input phrase like that in Figure
15 as the correctly disambiguated interpretation of the phrase "Italian salami sandwich." Note,
however, that if nodes representing neither or both of the keyphrases "Italian sandwich" and
"Italian salami" can be found in the keyphrase domain (i.e., both or neither of "sandwich with
Italian salami" and "Italian sandwich with salami" exist), then this method cannot be used to
disambiguate the phrase "Italian salami sandwich."
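The keyphrase-domain check just described can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the claimed implementation: each structured representation is reduced to the set of two-word keyphrases implied by its modifier links, the keyphrase domain is a plain set of strings, and the names `implied_keyphrases` and `disambiguate` are hypothetical. As in the Figure 14/15 example, only keyphrases comprised by one representation but not the other are consulted.

```python
def implied_keyphrases(modifier_links):
    """Hypothetical helper: each (modifier, head) pair in a structured
    representation yields the keyphrase "modifier head"."""
    return frozenset(f"{mod} {head}" for mod, head in modifier_links)


# Figure 14: "Italian" and "salami" both modify "sandwich".
FIG_14 = implied_keyphrases([("Italian", "sandwich"), ("salami", "sandwich")])
# Figure 15: "salami" modifies "sandwich", and "Italian" modifies "salami".
FIG_15 = implied_keyphrases([("salami", "sandwich"), ("Italian", "salami")])


def disambiguate(first, second, keyphrase_domain):
    """Return "first" or "second" when exactly one representation has a
    distinguishing keyphrase in the domain; return None when both or
    neither do (the phrase cannot be disambiguated this way)."""
    first_hit = bool((first - second) & keyphrase_domain)
    second_hit = bool((second - first) & keyphrase_domain)
    if first_hit and not second_hit:
        return "first"
    if second_hit and not first_hit:
        return "second"
    return None


domain = {"sandwich", "salami sandwich", "Italian salami"}
print(disambiguate(FIG_14, FIG_15, domain))  # second (the Figure 15 reading)
```

Because "salami sandwich" is implied by both readings, it is ignored; only "Italian sandwich" and "Italian salami" decide the outcome.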

Figure 16 is an illustration of one embodiment of this invention. This embodiment
includes a user interface 16.02 through which users can input queries in written 16.05 or speech
16.03 form, a spell-checker 16.06, a speech-recognition device 16.04, a natural language
understanding device 16.07, a word stemmer and normalizer 16.08, a query engine 16.10, a cross-linked
keyphrase ontology database 16.11, a sentence generator 16.12, a user interface device
providing responses to users 16.13, and a set of utilities 16.16. The utilities 16.16 interact with
the spell-checker 16.06, the natural language understanding device 16.07, the stemmer and
normalizer 16.08, and the cross-linked keyphrase ontology database 16.11. As shown in Figure 16,
users can choose whether to refine 16.15 or not refine 16.14 the queries they have previously input 16.01,
based on the system's responses 16.13 to their initial query.
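The flow through the components of Figure 16 can be pictured as a chain of stages. The Python sketch below is only an illustration: each device is passed in as a plain function, and the parameter names (`spell_check`, `recognize_speech`, and so on) are hypothetical stand-ins, not interfaces defined by this embodiment.

```python
from typing import Callable, Optional


def handle_query(
    text: Optional[str] = None,
    audio: Optional[bytes] = None,
    *,
    spell_check: Callable[[str], str],               # spell-checker 16.06
    recognize_speech: Callable[[bytes], str],        # speech recognition 16.04
    understand: Callable[[str], dict],               # NLU device 16.07
    stem_and_normalize: Callable[[dict], dict],      # stemmer/normalizer 16.08
    query_engine: Callable[[dict], list],            # query engine 16.10 (searches 16.11)
    generate_sentence: Callable[[dict, list], str],  # sentence generator 16.12
) -> str:
    """Route one query (written 16.05 or spoken 16.03) through the stages
    and return the feedback text for the output device 16.13."""
    if text is not None:
        string = spell_check(text)
    elif audio is not None:
        string = recognize_speech(audio)
    else:
        raise ValueError("a written or spoken query is required")
    # Structured representation 16.09, with stemmed (and possibly normalized) words.
    representation = stem_and_normalize(understand(string))
    results = query_engine(representation)
    return generate_sentence(representation, results)
```

In this sketch, a refinement step (16.15) would simply call `handle_query` again with the user's revised input.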

As shown in Figure 16, user interaction 16.01 with this invention is initiated from an input
device 16.02, which may be a text field, web page, speech channel, or some other form. The
cross-linked keyphrase ontology database allows highly reliable natural language keyphrase
searches with minimal initial knowledge engineering. Hence, one embodiment of the invention,
which takes advantage of its various properties, involves user input in the form of natural
language text or speech. As shown in Figure 16, if user input is written 16.05, a spell-checker
16.06 is used to normalize spelling. Jurafsky, et al., Speech and Language Processing (Upper
Saddle River, New Jersey: Prentice Hall, 2000) describes known methods of checking spelling
using computer devices. If user input is in the form of speech 16.03, a speech recognition device
16.04 must be used to convert input speech to a text string. Jurafsky, et al., Speech and
Language Processing (Upper Saddle River, New Jersey: Prentice Hall, 2000) describes
methods of converting speech to text using computer devices.
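As one simplified illustration of the kind of spelling correction cited above, the Python sketch below replaces an unknown word with the nearest vocabulary entry by Levenshtein (minimum edit) distance. The vocabulary, the threshold of 2 edits, and the function names are assumptions for illustration only; this is not presented as the spell-checker of the embodiment.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, vocabulary: set, max_distance: int = 2) -> str:
    """Return word if it is known; otherwise the closest vocabulary entry
    within max_distance edits (ties broken alphabetically), else word."""
    if word in vocabulary:
        return word
    best = min(vocabulary, key=lambda entry: (edit_distance(word, entry), entry))
    return best if edit_distance(word, best) <= max_distance else word


vocabulary = {"italian", "restaurant", "lamb", "napoletana"}  # hypothetical
print([correct_word(w, vocabulary) for w in "itallian restuarant with lamb".split()])
# ['italian', 'restaurant', 'with', 'lamb']
```

Words with no vocabulary entry within the threshold, such as "with" here, are passed through unchanged.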

As shown in Figure 16, the text string from the spell-checker or from the speech
recognition device is converted to a structured representation 16.09 by the natural language
understanding device 16.07 and a stemmer and normalizer 16.08. Stemming refers to the process
by which inflected verbs and comparative or superlative adjectives are transformed to their root
forms and plural nouns are singularized. Normalizing is the process of changing various verb
derivatives (such as "hiker") to the verb roots, or lemmas, from which they were derived (such as
"hike"). Whether normalization is performed depends on the natural language understanding
system used and the care with which the database is constructed. Stemming devices are known,
and many would serve the purpose of this embodiment.
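The difference between stemming and normalizing can be shown with a small table-driven sketch in Python. The lookup tables are hypothetical and deliberately tiny; an actual embodiment would use one of the known stemming devices rather than hand-written entries. In this sketch stemming runs first, so a plural derivative such as "hikers" becomes "hiker" and is only then normalized to the verb lemma "hike"; normalization can be switched off, reflecting that it is optional.

```python
# Hypothetical lookup tables, for illustration only.
INFLECTIONS = {   # inflected form -> root form (stemming)
    "hiked": "hike", "hiking": "hike",
    "cheaper": "cheap", "cheapest": "cheap",
    "restaurants": "restaurant", "sandwiches": "sandwich", "hikers": "hiker",
}
DERIVATIONS = {   # verb derivative -> verb lemma (normalizing)
    "hiker": "hike",
}


def stem(word: str) -> str:
    """Lower-case the word and map it to its root form if it is listed."""
    word = word.lower()
    return INFLECTIONS.get(word, word)


def normalize(word: str) -> str:
    """Map a verb derivative to the lemma it was derived from, if listed."""
    return DERIVATIONS.get(word, word)


def stem_and_normalize(words, use_normalization=True):
    """Stem every word, then optionally map verb derivatives to lemmas."""
    stems = [stem(w) for w in words]
    return [normalize(s) for s in stems] if use_normalization else stems


print(stem_and_normalize(["Hikers", "cheapest", "restaurants"]))
# ['hike', 'cheap', 'restaurant']
print(stem_and_normalize(["Hikers"], use_normalization=False))
# ['hiker']
```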

As shown in Figure 16, the structured representation 16.09, now with stemmed and
possibly normalized words, is then input to a query engine 16.10, which is a device that serves
several purposes. First, the query engine takes the stemmed and normalized structured
representation and uses it to search for objects in the cross-linked keyphrase ontology database
16.11. If objects with all the required cross-links are found in the database, the query engine
16.10 formats these items and passes information about them, and about the structured
representation 16.09 that comprised its input, to the sentence generator 16.12 and output
interface 16.13 devices. If no matching object nodes are found, the query engine 16.10 can
truncate or eliminate keyphrases comprised by the structured representation 16.09 to find the closest
matches to input queries 16.01. For example, Figure 17 shows a structured representation
resulting from the sentence "I want an Italian restaurant with lamb Napoletana." This structured
representation indicates that the object node being sought 17.03 is linked with nodes representing
the keyphrases "restaurant" (keyphrase node 17.01), "Italian" (keyphrase node 17.07), and "lamb
Napoletana," the last of which results from syntactic modification of "lamb" (keyphrase node
17.05) by "Napoletana" (keyphrase node 17.09). If no object node linked to nodes representing
the keyphrases "restaurant" (keyphrase node 17.01), "Italian" (keyphrase node 17.07), and "lamb
Napoletana" is found in the cross-linked keyphrase ontology database, the structured
representation shown in Figure 17 can be altered in the query engine by truncating keyphrases
or parts of multi-word keyphrases. Figure 18, for example, shows the structured representation
resulting from truncating the representation of the keyphrase "Italian" (keyphrase node 17.07)
from the structured representation shown in Figure 17. The truncated structured representation
shown in Figure 18 indicates that the object node being sought 18.03 is linked with nodes
representing the keyphrases "restaurant" (keyphrase node 18.01) and "lamb Napoletana," which
results from syntactic modification of the keyphrase "lamb" (keyphrase node 18.05) by the
keyphrase "Napoletana" (keyphrase node 18.09). Alternatively, truncating the representation
17.09 of "Napoletana" from the structured representation shown in Figure 17 results in
the structured representation shown in Figure 19. The structured representation shown in Figure
19 indicates that the object node being sought 19.03 is linked with nodes representing the
keyphrases "restaurant" (keyphrase node 19.01), "Italian" (keyphrase node 19.07), and "lamb"
(keyphrase node 19.05). An object node with an inheritance link from a keyphrase node
representing "restaurant" and cross-linked to a node representing the keyphrase "lamb
Napoletana" will match the structured representation shown in Figure 18, while an object node
with an inheritance link from a keyphrase node representing "restaurant" and cross-linked to
nodes representing the keyphrases "Italian" and "lamb" will match the structured representation
shown in Figure 19. Going even further, if object nodes like these cannot be found, truncating the
representations of both keyphrases "Italian" (keyphrase node 17.07) and "Napoletana" (keyphrase
node 17.09) from the structured representation shown in Figure 17 will change the search to one
for an object node with an inheritance link to a keyphrase node representing "restaurant" and with a
single cross-link to a keyphrase node representing "lamb."
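The truncation strategy of Figures 17 to 19 can be sketched in Python. The data model is an assumption made for brevity: each object node is a dictionary holding its parent keyphrase (the inheritance link) and the set of keyphrases it is cross-linked to, and a query keyphrase is a (head, modifier) tuple such as ("lamb", "Napoletana"). Candidate searches are ordered by the number of words removed, a hypothetical cost measure under which the Figure 18 and Figure 19 variants are tried right after the full query, and a search for "lamb" alone comes after those. The function names are hypothetical.

```python
from itertools import product


def keyphrase_options(head, modifier=None):
    """(kept keyphrase or None, words removed) choices for one query keyphrase."""
    if modifier is None:
        return [(head, 0), (None, 1)]
    return [(f"{head} {modifier}", 0), (head, 1), (None, 2)]


def search_with_truncation(parent, keyphrases, objects):
    """Find objects under `parent` cross-linked to every required keyphrase,
    truncating the query step by step until some objects match.
    Returns (words_removed, {required_keyphrases: [object names]})."""
    candidates = {}
    for combo in product(*(keyphrase_options(*kp) for kp in keyphrases)):
        required = frozenset(k for k, _ in combo if k is not None)
        cost = sum(removed for _, removed in combo)
        # Skip the empty query, which would match every object under `parent`.
        if required and cost < candidates.get(required, float("inf")):
            candidates[required] = cost

    for cost in sorted(set(candidates.values())):
        matches = {}
        for required, c in candidates.items():
            if c != cost:
                continue
            hits = [o["name"] for o in objects
                    if o["parent"] == parent and required <= o["links"]]
            if hits:
                matches[required] = hits
        if matches:
            return cost, matches
    return None, {}


objects = [  # hypothetical database contents
    {"name": "Trattoria Roma", "parent": "restaurant", "links": {"Italian", "lamb"}},
    {"name": "Taverna Kos", "parent": "restaurant", "links": {"Greek", "lamb"}},
]
query = [("Italian",), ("lamb", "Napoletana")]  # "Italian restaurant with lamb Napoletana"
cost, matches = search_with_truncation("restaurant", query, objects)
# With this data, the Figure 19 variant ("Italian" plus "lamb") matches Trattoria Roma.
```

Returning every match at the lowest cost, rather than the first one found, lets the output stage report several equally close alternatives.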

Whatever search is finally performed, the results are formatted and passed to the sentence
generator 16.12 and output user interface 16.13 device. If truncation has occurred in order to
avoid an empty result set, the user can be informed, for example, that the closest match is a
"restaurant with lamb Napoletana," an "Italian restaurant with lamb," or a "restaurant with lamb."
The user can then be given the chance to view such objects.

The sentence generator 16.12 shown in Figure 16 is a device for creating natural language
feedback that is displayed or read to the user through the output device 16.13. The purpose of
such feedback, in an embodiment, is to keep the user informed of how the search was performed, of
the results, and of potential problems in query interpretation. To continue the example in the
previous paragraph, the sentence generator may produce messages such as
"There are several Italian restaurants with lamb," or "Your request couldn't be fully satisfied. The
closest matches are Italian restaurants, or restaurants with lamb," or other messages, depending
on the search results. Sentence generation devices are known, and several of these can produce
the sentences required for this embodiment, given properly formatted information from the query
engine. Jurafsky, et al., Speech and Language Processing (Upper Saddle River, New Jersey:
Prentice Hall, 2000) describes some methods of sentence generation.
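A very simple template-based generator is enough to produce the two messages quoted above. The Python sketch below assumes, hypothetically, that the query engine hands over a plural description of the full request, the descriptions of any truncated searches that produced results, and a result count. It illustrates the kind of input the sentence generator needs and is not a general sentence generation method.

```python
def feedback_message(requested, closest_matches, n_results):
    """Build user feedback from query-engine output (hypothetical format).

    requested: plural description of the full query.
    closest_matches: descriptions of truncated searches that found results;
        empty when the full query was satisfied.
    n_results: number of objects found.
    """
    if n_results == 0:
        return f"No {requested} were found."
    if not closest_matches:
        if n_results == 1:
            return f"There is one match for {requested}."
        return f"There are several {requested}."
    return ("Your request couldn't be fully satisfied. The closest matches are "
            + ", or ".join(closest_matches) + ".")


print(feedback_message("Italian restaurants with lamb", [], 3))
# There are several Italian restaurants with lamb.
print(feedback_message("Italian restaurants with lamb Napoletana",
                       ["Italian restaurants", "restaurants with lamb"], 5))
# Your request couldn't be fully satisfied. The closest matches are Italian restaurants, or restaurants with lamb.
```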

Feedback may be given to users via speech rather than visually. In this case, information
from the query engine 16.10 and sentence generator 16.12 is passed to a speech synthesis
device, which converts text strings to spoken speech. Speech synthesis devices are known, and
several could serve the purpose of this embodiment. Jurafsky, et al., Speech and Language
Processing (Upper Saddle River, New Jersey: Prentice Hall, 2000) describes some methods of
speech synthesis. As shown in Figure 16, this embodiment includes various utility devices 16.16
to create, load, and maintain the database 16.11, and to log interactions and correct search errors.

Having described several different embodiments of the invention, it is not intended that the
invention be limited to these embodiments, and modifications and variations may be made by
one skilled in the art without departing from the spirit and scope of the invention as defined in the
claims.



WHAT IS CLAIMED IS:

1. A method of generating a cross-linked keyphrase ontology database comprising the steps
of:

(a) defining at least one keyphrase; 

(b) representing the keyphrase by a keyphrase node in an ontology; 

(c) cross-linking the keyphrase node to at least one second keyphrase node, wherein 
the second keyphrase node represents a second keyphrase in a second ontology; 
and 

(d) repeating steps (b) - (c) for each keyphrase defined in step (a). 

2. The method of claim 1, wherein the keyphrase in step (a) is generated by parsing a text. 

3. The method of claim 1, wherein the keyphrase in step (a) is selected from a group
consisting of nouns, adjectives, verbs and adverbs.

4. The method of claim 1, wherein the keyphrase in step (a) and the second keyphrase have
at least one word in common.

5. The method of claim 2, wherein the text is in the English language. 

6. A method of indexing a retrievable object in a cross-linked keyphrase ontology database
comprising the steps of: 

(a) representing the retrievable object by an object node in an ontology; and 
(b) cross-linking the object node to a keyphrase node, wherein the keyphrase node
represents a keyphrase in a second ontology and the keyphrase is related to the
retrievable object.

7. The method of indexing of claim 6, wherein the keyphrase is determined by parsing a text
related to the retrievable object.

8. The method of indexing of claim 6, wherein the retrievable object is a document.

9. The method of indexing of claim 6, wherein the retrievable object is a web page. 

10. The method of indexing of claim 6, wherein the retrievable object is a pointer.

11. The method of indexing of claim 6, wherein the retrievable object is an executable
computer program. 

12. A method of searching a cross-linked keyphrase ontology database comprising the steps
of: 

(a) parsing a natural language statement into a structured representation, wherein the
structured representation comprises at least one keyphrase; 

(b) searching the cross-linked keyphrase ontology database for at least one object 
node, wherein the object node is cross-linked to a keyphrase node representing a 
second keyphrase, wherein the second keyphrase matches the keyphrase parsed in 
step (a); and 

(c) defining a search result as a retrievable object, wherein the retrievable object is 
represented by the object node. 

13. The method of searching of claim 12, wherein the search result is displayed to a user in a
list.

14. The method of searching of claim 12, wherein the retrievable object is an executable
computer program.

15. The method of searching of claim 12, wherein the natural language statement is a query.

16. The method of searching of claim 12, wherein the keyphrase in step (a) and the second
keyphrase are identical. 

17. The method of searching of claim 12, wherein the keyphrase in step (a) and the second 
keyphrase are synonyms. 

18. The method of searching of claim 12, wherein the keyphrase in step (a) and the second
keyphrase are metonyms. 

19. The method of searching of claim 12, wherein the natural language statement is in the 
English language. 

20. A method of disambiguating a syntactically ambiguous natural language statement 
comprising the steps of: 

(a) parsing the syntactically ambiguous natural language statement into at least two
structured representations, wherein the first structured representation comprises at 
least one first keyphrase and the second structured representation comprises at
least one second keyphrase;

(b) searching a cross-linked keyphrase ontology database for a keyphrase node
representing a third keyphrase, wherein the third keyphrase matches the first
keyphrase or the second keyphrase;

(c) if the first keyphrase matches the third keyphrase and the second keyphrase does 
not match the third keyphrase, designating the first structured representation as a 
first disambiguated statement interpretation; 

(d) if the second keyphrase matches the third keyphrase and the first keyphrase does
not match the third keyphrase, designating the second structured representation as
a second disambiguated statement interpretation; and

(e) if the first keyphrase and the second keyphrase both match the third keyphrase,
or if neither the first keyphrase nor the second keyphrase matches the third
keyphrase, determining that the syntactically ambiguous natural language statement
cannot be disambiguated.

21. The method of disambiguating of claim 20, wherein the syntactically ambiguous natural
language statement is a query. 


22. The method of disambiguating of claim 20, wherein the third keyphrase is identical to the 
first keyphrase or the second keyphrase. 

23. The method of disambiguating of claim 20, wherein the third keyphrase is a synonym of
the first keyphrase or the second keyphrase.


24. The method of disambiguating of claim 20, wherein the third keyphrase is a metonym of
the first keyphrase or the second keyphrase. 



25. The method of disambiguating of claim 20, wherein the syntactically ambiguous natural
language statement is in the English language.